Sample-to-sample fluctuations in real-network ensembles
Network modeling based on ensemble averages tacitly assumes that the networks meant to be
modeled are typical in the ensemble. Previous research on network eigenvalues, which govern a
range of dynamical phenomena, has shown that this is indeed the case for uncorrelated networks
with minimum degree ≥ 3. Here we focus on real networks, which generally have both structural
correlations and low-degree nodes. We show that: (i) the ensemble distribution of the dynamically
most important eigenvalues can be not only broad and far from the real eigenvalue but also
highly structured, often with a multimodal rather than bell-shaped form; (ii) these
properties are due to low-degree nodes, mainly those with degree < 3, and to network
communities, which are a common form of structural correlation found in real networks. In addition
to having implications for ensemble-based approaches, this shows that low-degree nodes may have a
stronger influence on collective dynamics than previously anticipated from the study of
computer-generated networks.

PACS numbers: 05.50.+q, 05.10.-a, 87.18.Sn, 89.75.-k



In the network modeling of collective behavior, while one can analyze each network
individually, the ideal is to draw general conclusions that can apply to an entire class of
networks. However, one must ascertain under what conditions such results apply. These can be
determined by considering ensembles, and many results have already been established for
ensembles of random networks. It remains to be addressed, though, the extent to which random
ensembles are representative of real networks. Given a real network $\mathcal{N}$, one can
define an associated ensemble $\mathcal{E}_{\mathcal{N}}(p_1, \ldots, p_n)$ as the set of all
possible realizations of the network in which one or more parameters, represented by
$p_1, \ldots, p_n$, are preserved. In one extreme there is $\mathcal{E}_{\mathcal{N}}(N)$, where
one only fixes the number of nodes, so that the real network could be very dissimilar from most
ensemble elements. In the opposite extreme there is $\mathcal{E}_{\mathcal{N}}(p_1, p_2, \ldots)$,
where all possible parameters are fixed, but this is equivalent to studying the original
network. An important goal is to restrict as few of the parameters as possible while still
capturing the essential features of the real network. This is fundamental for the study of
collective dynamics because in many network processes, including diffusion, consensus phenomena,
and synchronization, the influence of the network structure is determined by the eigenvalues of a
coupling matrix, which exhibit a rather convoluted dependence on simple network properties.
Focusing primarily on the ensemble $\mathcal{E}_{\mathcal{N}}(N, \{k_i\})$, which preserves the
number of nodes and the degree sequence $\{k_i\}$, we study the ensemble distribution of
individual eigenvalues and the conditions under which the ensemble networks are representative
of the real network.



I. INTRODUCTION 

The complexity of networked systems is frequently studied via the empirical observation that
different real networks share common structural properties. Such common properties have
implications for network dynamical phenomena, which are often believed not to depend strongly on
the specific network under consideration. Ensembles of networks designed to reproduce common
properties, including a heterogeneous degree distribution and a certain level of randomness, have
been widely used in statistical physics studies of networks. This provides a convenient tool to
address general and possibly universal aspects of network phenomena [1]. The inverse approach,
focused on building a precise model to reproduce an observed network dynamical phenomenon,
remains challenging in general. But to what extent can ensemble studies provide information
about the properties of individual networks?

Previous research focused on the eigenvalues of coupling matrices has shown that the ensemble
distributions of the eigenvalues converge to peaked, bell-shaped functions as the number of
nodes in the network is increased [6]. This result was established for random uncorrelated
networks of given degree distributions with minimum degree ≥ 3, showing that under these
conditions the eigenvalues of large networks are well represented by ensemble averages. However,
the sample-to-sample fluctuations across the ensemble, which determine the quality of ensemble
averages, may change when these conditions are relaxed. In particular, it has been shown that
having a finite fraction of nodes with degree one or two can fundamentally alter the value of
individual eigenvalues in the thermodynamic limit. These studies are insightful, and so are
those based on the analysis of individual computer-generated networks [1]. Yet, they do not
directly address the properties of real networks, since empirically observed networks have
finite size, are structurally correlated, and usually include low-degree nodes.

TABLE I. Real networks considered in this study. The columns show basic properties of the real
networks as well as the extreme eigenvalues, the corresponding spectral positions $\Delta_x$,
and the standard deviations $\sigma_x$ of the random ensembles (see Sec. III). The basic
properties are the number of nodes $N$, the average degree $\langle k \rangle$, and the maximum
degree $k_N$. The minimum degree $k_1$ is one for all networks. In each case, we focus on the
largest connected component of the real network. In the $k$-core test with $k = 2$ and $k = 3$,
the percentage of remaining nodes is $q_2$ and $q_3$, respectively. A summary of the 3-core
tests is given as superscripted symbols, where † indicates an originally structured eigenvalue
distribution that becomes unimodal in the 3-core networks, while those with * are still
structured in the 3-core networks. No symbol is specified for non-structured distributions in
the original random ensemble.

The central question that we raise in this context is how typical a real network is in an
associated ensemble that preserves a selection of its local structural properties, such as the
degree sequence. Here, we address this question by sampling the associated ensemble and
contrasting the relevant eigenvalues of the ensemble elements with those of the real network
used to generate it. We focus on the extreme (largest and/or smallest nonzero) eigenvalues of
coupling matrices, because they encapsulate the structural network attributes that govern a
number of network dynamical processes, such as synchronization [9]-[11], diffusion, and epidemic
spreading. The results are, therefore, representative of the impact that ensemble-based
approaches have on the study of network dynamics in general.

The article is organized as follows. In Sec. II we introduce and motivate the eigenvalues as
well as the real networks we consider. In Sec. III we show that in some cases the real network
is well represented by the ensemble distribution, but in many other cases the ensemble
distribution deviates significantly from the real network. We also show that the ensemble
distributions are often highly structured, exhibiting multiple peaks. In Sec. IV we explore the
properties of $k$-cores and network eigenvectors to elaborate on the origin of these structures.
We also discuss the impact of community structures to rationalize the observed deviations of the
real eigenvalues from the ensemble distributions. Finally, our concluding remarks are presented
in Sec. V.
[Figure 1 panel columns: protein interaction, coauthorship, power grid.]

FIG. 1: Ensemble distributions of extreme eigenvalues for a selection of real networks: the
protein interaction network, the coauthorship network, and the power grid (Table I). The
eigenvalues are (a) the smallest nonzero eigenvalue of the Laplacian, (b) the largest eigenvalue
of the Laplacian, (c) the largest eigenvalue of the adjacency matrix, (d) the smallest nonzero
eigenvalue of the normalized Laplacian, and (e) the largest eigenvalue of the normalized
Laplacian. Tilde is used to indicate that the distributions are rescaled as
$\tilde{x} = (x - \langle x \rangle)/\sigma_x$ to have zero averages and unit variances, where
$\langle x \rangle$ is the average and $\sigma_x$ is the standard deviation of the original
distribution $P(x)$ of an eigenvalue $x$. The arrows (top) indicate the positions of the real
extreme eigenvalues that lie within the range of the plot, clearly showing that in most cases
the real network is not "typical" within the random ensemble.



II. EIGENVALUES AND EMPIRICAL NETWORKS



We focus on the extreme eigenvalues of three connectivity matrices that play important roles in
many dynamical processes: the adjacency matrix, the Laplacian matrix, and the normalized
Laplacian matrix. The adjacency matrix is defined as $A = (A_{ij})$, where $A_{ij} = 1$ if nodes
$i$ and $j$ are connected and $A_{ij} = 0$ otherwise. The Laplacian $L$ and the normalized
Laplacian $\hat{L}$ are defined as $L = D - A$ and $\hat{L} = D^{-1}L$, respectively, where
$D = \mathrm{diag}\{k_1, \ldots, k_N\}$ is the diagonal matrix of degrees. For connected
undirected networks, as considered here, the smallest eigenvalue of the matrix $L$ is zero and
all the others are strictly positive. The same holds true for the matrix $\hat{L}$. These
coupling matrices have broad significance for the study of network dynamics. Synchronization of
diffusively coupled oscillators, for example, is often determined by the largest eigenvalue
($\lambda_N$) and the smallest nonzero eigenvalue ($\lambda_2$) of the Laplacian $L$ [1]. The
relaxation time in diffusion processes is determined by the corresponding eigenvalues of the
normalized Laplacian $\hat{L}$, which we denote $\mu_N$ and $\mu_2$, respectively. The threshold
for epidemic spreading [14] and the dynamic range in excitable systems [16], on the other hand,
are largely influenced by the largest eigenvalue ($\Lambda_N$) of the adjacency matrix $A$.
Motivated by these and other dynamical applications in which extreme eigenvalues are found to
play a role, the eigenvalues of interest in this study are $\lambda_2$, $\lambda_N$, $\mu_2$,
$\mu_N$, and $\Lambda_N$. For notational convenience, the nodes are labeled in increasing order
of their degrees $k_i$, such that $k_1 \le \ldots \le k_N$.
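As an illustration of these definitions, the following minimal sketch computes the five extreme eigenvalues for a small graph. It assumes NumPy is available; the function name `extreme_eigenvalues` and the edge-list input format are choices made here for illustration and are not taken from the article. Because $D^{-1}L$ is not symmetric, the sketch diagonalizes the similar symmetric matrix $D^{-1/2} L D^{-1/2}$, which has the same eigenvalues.

```python
import numpy as np

def extreme_eigenvalues(edges, n):
    """Return (lambda_2, lambda_N, mu_2, mu_N, Lambda_N) for a connected,
    undirected graph on nodes 0..n-1 given as a list of edges (i, j)."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    k = A.sum(axis=1)                    # degree of each node
    L = np.diag(k) - A                   # Laplacian L = D - A
    # D^{-1} L is similar to the symmetric matrix D^{-1/2} L D^{-1/2},
    # so both have the same (real) eigenvalues.
    d = 1.0 / np.sqrt(k)
    L_sym = d[:, None] * L * d[None, :]
    lam = np.linalg.eigvalsh(L)          # eigenvalues in ascending order
    mu = np.linalg.eigvalsh(L_sym)
    Lam = np.linalg.eigvalsh(A)
    return lam[1], lam[-1], mu[1], mu[-1], Lam[-1]

# Example: complete graph on 4 nodes (a hypothetical test case).
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
lam2, lamN, mu2, muN, LamN = extreme_eigenvalues(edges, 4)
```

For the complete graph on four nodes, the Laplacian spectrum is {0, 4, 4, 4}, the normalized Laplacian spectrum is {0, 4/3, 4/3, 4/3}, and the largest adjacency eigenvalue is 3, which provides a simple check of the routine.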

We consider twelve real networks from various domains, including technology, biological
sciences, and sociology, which span a wide range of sizes and link densities (Table I). For each
of these networks we define the associated random ensemble
$\mathcal{E}_{\mathcal{N}}(N, \{k_i\})$, which preserves the number of nodes and the degree
sequence, and we study the properties of the extreme eigenvalues in this ensemble. This is
implemented computationally by randomly selecting independent network realizations in the
ensemble. The ensemble networks are generated using the link-rewiring algorithm [17], which
randomizes a network while preserving the given degree sequence $\{k_i\}$. In this construction,
all links are regarded as undirected, self-links and duplicated links are forbidden, and the
networks are required to remain connected. Two realizations become statistically independent
after sufficiently many link-rewiring operations. Our statistics are based on 10,000 independent
network realizations for each ensemble. The finite number of realizations leads to a discrete
set of the extreme eigenvalues $\{x_i\}$, so that the distribution can be formally written as
$P(x) \propto \sum_i \delta(x - x_i)$. To avoid artifacts associated with the discreteness of
the distribution, the Dirac delta $\delta(x)$ is approximated as a Gaussian distribution with a
small variance.
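The rewiring construction described above can be sketched as a sequence of double-edge swaps, each of which is rejected if it would create a self-link, a duplicated link, or a disconnected network. The code below is a minimal illustration using only the Python standard library; the function names, the number of attempted swaps, and the small test graph are assumptions made here and are not specified in the article.

```python
import random

def is_connected(adj):
    """Depth-first search on an adjacency dict {node: set(neighbors)}."""
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)

def rewire(edges, n_attempts, rng):
    """Attempt n_attempts degree-preserving double-edge swaps.

    A swap replaces links (a, b), (c, d) with (a, d), (c, b). It is rejected
    if it creates a self-link or duplicated link, or disconnects the network.
    """
    edges = [tuple(e) for e in edges]
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:           # pick one of the two orientations
            c, d = d, c
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue
        # Apply the swap tentatively.
        adj[a].remove(b); adj[b].remove(a)
        adj[c].remove(d); adj[d].remove(c)
        adj[a].add(d); adj[d].add(a)
        adj[c].add(b); adj[b].add(c)
        if is_connected(adj):
            edges[i], edges[j] = (a, d), (c, b)
        else:
            # Undo the swap to keep the network connected.
            adj[a].remove(d); adj[d].remove(a)
            adj[c].remove(b); adj[b].remove(c)
            adj[a].add(b); adj[b].add(a)
            adj[c].add(d); adj[d].add(c)
    return edges

# Example: a ring of 10 nodes with two chords (a hypothetical test graph).
rng = random.Random(0)
original = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7)]
randomized = rewire(original, n_attempts=200, rng=rng)
```

Checking connectivity after every swap is simple but costly for large networks; it is used here only to keep the sketch short.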



III. EIGENVALUE ENSEMBLE DISTRIBUTIONS

Figure 1 shows the ensemble distributions of the extreme eigenvalues for a selection of
disparate real networks: a protein-interaction network, a scientific coauthorship network, and a
power-grid network. Some of the distributions, such as the largest eigenvalue of the adjacency
matrix for all three networks [Fig. 1(c)] and the largest normalized Laplacian eigenvalue for
the power-grid network [Fig. 1(e)], exhibit relatively well-defined bell-shaped distributions.
Others, however, exhibit pronounced deviations, including secondary peaks. This is the case, for
example, for the extreme eigenvalues of the Laplacian [Fig. 1(a)-(b)] and normalized Laplacian
[Fig. 1(d)-(e)] of the coauthorship network. Additional information is provided by considering
the position of the corresponding eigenvalues of the real networks relative to these ensemble
distributions. Surprisingly, in most cases the eigenvalue of the real network deviates
significantly from the ensemble average. A notable exception is the largest Laplacian eigenvalue
of the coauthorship network [Fig. 1(b)], where the real network is well approximated by the
ensemble average. Perhaps even more surprisingly, there appears to be essentially no relation
between the bell-shaped form of the distribution and the quality of this approximation. For
example, the largest Laplacian eigenvalue of the protein-interaction network does not lie within
the range of the plot despite the bell-shaped form of the ensemble distribution [Fig. 1(b)], and
the same is true for the largest eigenvalue of the adjacency matrix of all three networks
[Fig. 1(c)].

FIG. 2: Effect of low-degree nodes on ensemble distributions for the Internet and the network of
political blogs. The distributions correspond to (a, c) the smallest nonzero eigenvalue of the
Laplacian and (b, d) the smallest nonzero eigenvalue of the normalized Laplacian. The
distribution for the largest eigenvalue of the normalized Laplacian is essentially
indistinguishable from the latter, and is not shown. Dotted lines indicate the distributions
associated with the original real networks and continuous lines indicate the distributions for
the corresponding 3-cores of the networks. All distributions are rescaled as in Fig. 1. In most
cases the ensemble distributions for the 3-cores are significantly smoother than for the
original networks, indicating that at least part of the observed structures is due to low-degree
nodes.

To quantify this deviation, we consider the spectral position of the real-network eigenvalues,
which we define as

$$\Delta_x = \frac{x^{(0)} - \langle x \rangle}{\sigma_x},$$

where $x$ represents the eigenvalue. Here, the superscript (0) indicates the eigenvalue of the
real network, $\langle x \rangle$ is the average of the eigenvalue in the associated ensemble,
and $\sigma_x$ is the standard deviation of the ensemble distribution $P(x)$. This simple
quantity provides a meaningful measure for the extent to which a real eigenvalue deviates from
the ensemble average, which is expressed in units of the standard deviation.
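A minimal sketch of how the spectral position and the Gaussian-smoothed ensemble distribution could be evaluated from a list of sampled eigenvalues is given below. The function names, the kernel width, and the use of the population standard deviation are assumptions made here for illustration; the article does not state which estimator or width was used.

```python
import math
import statistics

def spectral_position(x_real, samples):
    """Deviation of the real eigenvalue from the ensemble average,
    in units of the ensemble standard deviation."""
    mean = statistics.mean(samples)
    sigma = statistics.pstdev(samples)   # population std (an assumption)
    return (x_real - mean) / sigma

def smoothed_density(samples, x, width):
    """Estimate P(x) by replacing each Dirac delta at a sampled eigenvalue
    with a normalized Gaussian of standard deviation `width`."""
    norm = 1.0 / (len(samples) * width * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / width) ** 2)
                      for xi in samples)

# Example with hypothetical ensemble samples of one eigenvalue.
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
delta = spectral_position(5.0, samples)          # (5 - 3) / sqrt(2)
density_at_mean = smoothed_density(samples, 3.0, width=0.5)
```

A value of $|\Delta_x|$ much larger than unity indicates that the real eigenvalue lies far in the tail of the ensemble distribution.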

Table I summarizes the statistics for all 12 empirical networks considered. Several properties
of the real-network eigenvalues, such as the approximate symmetry between $\mu_2$ and
$2 - \mu_N$, are in good agreement with theoretical predictions. However, in most cases the
value of $|\Delta_x|$ is larger than unity, and in many cases it is much larger, confirming that
real networks are often not typical in their own associated ensembles. That is, in terms of the
extreme eigenvalues considered here, the real networks are often significantly different from
the majority of the ensemble networks. Another interesting aspect of the results shown in
Table I is that this deviation is not necessarily due to large deviations in absolute values.
For all real networks, $\lambda_N$ is just slightly larger than $k_N + 1$, as predicted
theoretically for uncorrelated networks. The ensemble distributions are also peaked close to
this point for all networks, but because these distributions are very narrow, a small deviation
in absolute value tends to correspond to a relatively large number in units of standard
deviation.

On the other hand, several cases exhibit a structured rather than bell-shaped distribution,
which cannot be anticipated from these theoretical results. This is so for the smallest nonzero
eigenvalues of the Laplacian and of the normalized Laplacian (the same is true also for the
largest eigenvalue of the normalized Laplacian, due to the symmetry mentioned above, which is
nearly exact for the ensemble networks). These cases are marked with superscripted symbols in
Table I. For example, for the Internet network and the network of political blogs, the ensemble
distributions of these eigenvalues exhibit multiple large and relatively distant peaks (Fig. 2).
The next question concerns the origin of these abnormal fluctuations. We hypothesize that the
main cause of the fluctuations is the presence of poorly connected nodes and/or poorly connected
groups of nodes. The basis for this hypothesis is that the smallest nonzero eigenvalues of
Laplacian-like matrices are known to be influenced by low-degree nodes as well as by communities
of densely connected nodes that are sparsely connected with the rest of the network. Next, we
study the extent to which these factors can generate the observed fluctuations in the ensemble
distributions and deviations between the real eigenvalues and the ensemble averages.



IV. ROLE OF LOW DEGREES AND ADDITIONAL NETWORK STRUCTURE

To probe the influence of low-degree nodes, we explore the $k$-core organization of the
networks. The $k$-core of a network is the largest connected subnetwork in which all nodes have
degree at least $k$. Given a real network $\mathcal{N}$, we extract the
$k$-core$(\mathcal{N})$ and then generate another random ensemble
$\mathcal{E}_{k\text{-core}(\mathcal{N})}(N', \{k'_i\})$, where $N'$ is the number of nodes and
$\{k'_i\}$ is the degree sequence of the $k$-core. The first case of interest corresponds to
2-cores, where the minimum degree in the new network is 2. For the networks considered in this
study, the 2-cores are found to exhibit spiky ensemble distributions comparable to those of the
original networks. Our analysis of 3-cores, on the other hand, reveals very different behavior:
as illustrated in Fig. 2(a)-(b) for the Internet network, the distributions of the smallest
nonzero eigenvalues of the Laplacian and normalized Laplacian become significantly smoother and
close to bell-shaped curves. Similar smoothening of ensemble distributions for the 3-cores is
observed for the smallest nonzero eigenvalues of most networks with fluctuations, as summarized
in Table I. This confirms that the fluctuations in $\lambda_2$ and $\mu_2$ (and also $\mu_N$)
are mainly due to nodes with degree 1 and 2.

FIG. 3: Example of network structure contributing to the fluctuations in the distribution
$P(\lambda_2)$ associated with the network of political blogs. The subgraph shown highlights the
relevant part of an ensemble network at the second peak (left to right) of the 3-core random
ensemble in Fig. 2(c). The positions of the peaks in the distribution can be estimated by
considering only the submatrix of the Laplacian that includes the links between the two
low-degree nodes ($\alpha$ and $\beta$) and their neighbors (solid lines).
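The $k$-core extraction used in this test can be sketched as iterative pruning of nodes with degree below $k$, followed by selection of the largest connected component of what remains. The implementation below uses only the Python standard library; the function name `k_core` and the small example graph are hypothetical and serve only to illustrate the procedure.

```python
def k_core(adj, k):
    """Return the node set of the largest connected subnetwork in which
    all nodes have degree at least k. `adj` maps node -> set of neighbors."""
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    queue = [u for u in adj if len(adj[u]) < k]
    while queue:
        u = queue.pop()
        if u not in adj:                 # already pruned
            continue
        for v in adj.pop(u):
            adj[v].discard(u)
            if len(adj[v]) < k:          # pruning u may expose v
                queue.append(v)
    # Keep the largest connected component of the pruned network.
    best, seen = set(), set()
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

# Example: a triangle (0, 1, 2) linked to a 4-clique (3..6), plus a pendant node 7.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 7),
         (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
core2 = k_core(adj, 2)   # the pendant node is removed
core3 = k_core(adj, 3)   # only the 4-clique survives
```

The degree sequence of the returned subnetwork then defines the ensemble that is sampled for the $k$-core test.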

We do not systematically consider $k$-cores for higher $k$ because the loss of statistics due to
the reduction in the size of the network may compete with the effect of removing low-degree
nodes. However, there are cases where the fluctuations still appear in the ensemble
distributions of the 3-cores, such as for the smallest nonzero eigenvalue of the Laplacian in
the network of political blogs [Fig. 2(c)]. This suggests that other types of network structures
are affecting some of the ensemble distributions.

In the particular case of the network of political blogs, we find that there is a relationship
between the distribution of the eigenvalue $\lambda_2$ and subgraph structures involving
low-degree nodes. Specifically, at the peaks of the distribution of $\lambda_2$, the components
of the associated eigenvector are dominantly large for a certain pair of low-degree nodes that
are directly connected and whose other neighbors have considerably larger degrees. Figure 3
highlights this structure in an ensemble element that is at one of the peaks of
$P(\lambda_2)$. While different ensemble realizations will have different such nodes connected
to each other, we can show that the impact they have on the fluctuations of the eigenvalue
distributions is mainly determined by their degrees. Overall, there exist only a few links
between low-degree nodes in a chosen network realization, but the frequency with which an
ensemble network exhibits at least one such subgraph is relatively high. This is likely related
to the fact that the network of political blogs has a very long-tailed degree distribution,
which, along with the constraints of not having self-links and duplicated links, leads to the
relatively frequent occurrence of such subgraphs in the ensemble. Moreover, the observation of
these subgraphs allows us to estimate analytically the positions of the peaks of
$P(\lambda_2)$ for this network.

Assuming that only one such subgraph contributes to the eigenvector of $\lambda_2$, we can
project the full Laplacian onto the reduced space that consists of two low-degree nodes,
$\alpha$ and $\beta$, and their neighbors. Accordingly, by writing the eigenvalue equation
$(L - \lambda_2 I)\mathbf{v} = 0$ explicitly and noting that the degrees of the neighbors of
$\alpha$ and $\beta$ are much larger than $\lambda_2$, we derive the approximate expression

$$\lambda_2 \approx \frac{k_\alpha + k_\beta}{2} - \sqrt{\left(\frac{k_\alpha - k_\beta}{2}\right)^2 + 1} + O(\epsilon_{\alpha\beta}),$$

where $\epsilon_{\alpha\beta}$ is a small number given by the product of
$f_{\alpha\beta} = 1 + (k_\alpha + k_\beta)/\sqrt{(k_\alpha - k_\beta)^2 + 4}$ and a summation
taken over the neighbors of $\alpha$ and $\beta$. Even with the rough approximation
$\epsilon_{\alpha\beta} = 0$, the calculated $\lambda_2$ shows remarkable agreement with the
peaks of the eigenvalue distribution observed in the network of political blogs. In the random
ensemble of the original network, the low-degree combinations provide
$\lambda_2 \approx 0.38, 0.58, 0.69$ for $k_\alpha = 1$ and $k_\beta = 2, 3, 4$, respectively,
which are in precise agreement with the observed major peaks of $P(\lambda_2)$. Even though
$\epsilon_{\alpha\beta}$ increases with $k_\alpha$ and $k_\beta$, the equation above also
provides very good estimations for the peaks in the ensemble of 3-cores. The estimations for
$k_\alpha = 3$ and $k_\beta = 3, 4, 5$ are 2.00, 2.38, 2.58, respectively, which are very close
to the major peaks observed at $\lambda_2 = 1.96, 2.33, 2.53$. While this eigenvector analysis
applies to the network of political blogs, the multimodal distributions found in the other
network ensembles may be determined by other network structures. But we suggest that even in
such cases, the peaks in the eigenvalue distributions are likely to be associated with patterns
of subgraph structures that can take a relatively small number of forms.
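The quoted peak positions can be reproduced with a short calculation. The sketch below assumes that, with $\epsilon_{\alpha\beta} = 0$, the estimate reduces to the smaller eigenvalue of the $2 \times 2$ Laplacian block for the two linked low-degree nodes, with diagonal entries $k_\alpha$, $k_\beta$ and off-diagonal entries $-1$. This closed form is an inference from the numbers quoted in the text, and the function name `lambda2_peak` is introduced here for illustration.

```python
import math

def lambda2_peak(k_alpha, k_beta):
    """Smaller eigenvalue of [[k_alpha, -1], [-1, k_beta]], used here as
    the leading-order (epsilon = 0) estimate of a peak position."""
    half_sum = 0.5 * (k_alpha + k_beta)
    half_diff = 0.5 * (k_alpha - k_beta)
    return half_sum - math.sqrt(half_diff ** 2 + 1.0)

# Degree pairs discussed in the text: original ensemble and 3-core ensemble.
original_pairs = [(1, 2), (1, 3), (1, 4)]
core_pairs = [(3, 3), (3, 4), (3, 5)]
original_peaks = [lambda2_peak(a, b) for a, b in original_pairs]
core_peaks = [lambda2_peak(a, b) for a, b in core_pairs]
```

Under this assumption the computed values agree with the estimates quoted above to within 0.01, which supports the reading that the peaks are set mainly by the degrees of the two linked nodes.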

Another important question concerns the origin of the often large deviation of the real
eigenvalues from the ensemble averages even when the ensemble distributions are approximately
bell-shaped. We propose that this is caused by the presence of structures in the real networks
that would correspond to rare events in the random ensemble. An example of a particularly
important structure of this kind, which can lead to large deviations from the ensemble averages,
is shown in Fig. 4(a): a densely connected community in the word network (Moby Thesaurus). This
cluster dominantly contributes to the eigenvectors corresponding to the extreme eigenvalues
$\lambda_2$ and $\mu_2$. In the cluster, eight words closely related to 'guitar' form a fully
connected subnetwork; this cluster is connected to the rest of the network by the link between
'lute', which has two very distinct meanings, and 'adhesive tape'. This type of weakly connected
network structure can cause the smallest nonzero eigenvalue of the Laplacian to be very small.
This explains the small values of $\lambda_2$ and $\mu_2$ found in the word network.

To exemplify this effect, consider the model network with two communities shown in Fig. 4(b).
The eigenvalue $\lambda_2$ for this network is 0.045. For the associated random ensemble, the
ensemble average of this eigenvalue is 0.36, which is substantially larger than the eigenvalue
of the initial network. This is so mainly because the ensemble does not preserve the community
structure. Indeed, a community-preserving ensemble can be created in which the two communities
are separately randomized and then linked together by a single link, and within this ensemble
the average of $\lambda_2$ is very close to the eigenvalue of the original network.
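The qualitative effect can be illustrated with a small numerical experiment. The sketch below is not the 50-node model of Fig. 4(b), whose internal structure is not specified in the text; it uses two hypothetical 8-node cliques joined by a single link, assumes NumPy is available, and compares $\lambda_2$ of this network with its average over a few degree-preserving, connected randomizations. All function names and parameter values are illustrative assumptions.

```python
import random
import numpy as np

def lambda2(edges, n):
    """Smallest nonzero Laplacian eigenvalue of a connected graph."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] -= 1.0
        L[j, i] -= 1.0
        L[i, i] += 1.0
        L[j, j] += 1.0
    return np.linalg.eigvalsh(L)[1]

def connected(edges, n):
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n

def randomize(edges, n, n_attempts, rng):
    """Degree-preserving double-edge swaps without self-links or duplicated
    links, keeping the network connected (community structure is ignored)."""
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len({a, b, c, d}) < 4 or new1 in present or new2 in present:
            continue
        trial = list(edges)
        trial[i], trial[j] = (a, d), (c, b)
        if connected(trial, n):
            present -= {frozenset((a, b)), frozenset((c, d))}
            present |= {new1, new2}
            edges = trial
    return edges

def clique(offset, m):
    return [(offset + i, offset + j) for i in range(m) for j in range(i + 1, m)]

m = 8                                    # hypothetical community size
n = 2 * m
original = clique(0, m) + clique(m, m) + [(m - 1, m)]   # single bridging link
rng = random.Random(1)
lam_real = lambda2(original, n)
lam_ens = [lambda2(randomize(original, n, 10 * len(original), rng), n)
           for _ in range(10)]
lam_mean = sum(lam_ens) / len(lam_ens)
```

In this toy setting the bridging link keeps `lam_real` small, whereas randomization spreads links across the two groups and raises the ensemble average, in line with the argument above.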

Because this type of structure is expected to be present to some extent in most real networks,
the smallest nonzero eigenvalues of the Laplacian and normalized Laplacian of real networks are
generally expected to be smaller than the corresponding random-ensemble averages. This explains
the negative spectral position in most cases shown in Table I. This, in turn, is consistent with
the positive spectral position exhibited by the largest eigenvalue of the normalized Laplacian
for the networks we consider. Community structures are expected to also impact other
eigenvalues, such as the eigenvalues of the adjacency matrix, which have been used to design
algorithms for community detection [20]. The deviation of the eigenvalues from the ensemble
distributions can also be partially determined by different network structures that set them
apart from random, such as clustering, degree correlations, and assortative mixing [21]-[23].
Disassortative networks, for example, are known to exhibit enhanced synchronization properties
precisely because they have a smaller ratio $\lambda_N/\lambda_2$ than their random
counterparts.

V. FINAL REMARKS 

The fluctuations in the ensemble distributions observed in this study have important
implications. On one hand, we have provided evidence that these structures are largely due to
low-degree nodes in the network. On the other hand, it follows that these fluctuations cannot be
ignored in the estimation and interpretation of ensemble averages associated with networks that
have low-degree nodes, which is the rule and not the exception among real networks. Moreover,
because these distributions can be broad, if one samples networks from the ensemble, the
eigenvalue fluctuations from sample to sample will frequently be large. Another interesting
aspect of this problem is that low degrees alone may not explain all the observed fluctuations
and that, even for bell-shaped distributions, the real eigenvalue of interest often deviates
significantly from the ensemble average. For some of the eigenvalues, this deviation can be
mainly attributed to the presence of community structures in the network. This, in turn,
suggests that a promising approach would be to incorporate the community structures in the
definition of the ensemble, as additional properties $\{p_i\}$ in
$\mathcal{E}_{\mathcal{N}}(N, \{k_i\}, \{p_i\})$. There are in fact models to generate random
network ensembles with a large number of preserved properties, including communities [25] and
distributions of subgraphs. An important challenge for future research is to address the
properties of real networks within the framework provided by such models.

FIG. 4: Network structure affecting the extreme eigenvalues $\lambda_2$ and $\mu_2$.
(a) Community structure found in the word network. (b) Model network with 50 nodes, consisting
of two clusters connected to each other by a single link. The randomization of the whole network
without preserving the clusters tends to increase the smallest nonzero eigenvalues of the
Laplacian and normalized Laplacian.